Improved Techniques for Training Score-Based Generative Models

Neural Information Processing Systems

Score-based generative models can produce high quality image samples comparable to GANs, without requiring adversarial optimization. However, existing training procedures are limited to images of low resolution (typically below 32 x 32), and can be unstable under some settings. We provide a new theoretical analysis of learning and sampling from score models in high dimensional spaces, explaining existing failure modes and motivating new solutions that generalize across datasets. To enhance stability, we also propose to maintain an exponential moving average of model weights. With these improvements, we can effortlessly scale score-based generative models to images with unprecedented resolutions ranging from 64 x 64 to 256 x 256. Our score-based models can generate high-fidelity samples that rival best-in-class GANs on various image datasets, including CelebA, FFHQ, and multiple LSUN categories.
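The stabilization technique named in the abstract, an exponential moving average (EMA) of model weights, can be sketched as follows. This is a minimal illustration: the decay value and the plain-dict weight representation are assumptions for the example, not details from the paper.

```python
# Minimal sketch of an exponential moving average (EMA) of model weights.
# The decay of 0.5 is chosen only to make the arithmetic visible; papers
# typically use values like 0.999.
def ema_update(ema_params, params, decay=0.999):
    """Blend the current weights into the running average in place."""
    for k in ema_params:
        ema_params[k] = decay * ema_params[k] + (1.0 - decay) * params[k]
    return ema_params

ema = {"w": 0.0}
for step_weight in [1.0, 1.0, 1.0]:  # pretend the trained weight sits at 1.0
    ema_update(ema, {"w": step_weight}, decay=0.5)
print(ema["w"])  # 0.875 after three updates, converging toward 1.0
```

At sampling time the EMA copy of the weights is used instead of the raw, noisier training weights.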


Review for NeurIPS paper: Improved Techniques for Training Score-Based Generative Models

Neural Information Processing Systems

Weaknesses: The experimental section does not quite have the experiments I was hoping for. I was hoping to understand *which* techniques were important for scaling the model to higher-resolution images. I know that all of the techniques together are sufficient for the model to learn LSUN images, but my suspicion is that only a subset of them (perhaps only EMA) is needed for a regular NCSN to learn high-resolution images. A better understanding of which techniques matter would help explain what is needed to scale the model. Moreover, the ablation experiments provided suggest that not all 5 techniques are needed for all datasets: Fig 5 seems to show that the simplified network is not needed on CelebA 64x64, as FID is better with the original network.


Review for NeurIPS paper: Improved Techniques for Training Score-Based Generative Models

Neural Information Processing Systems

Some minor concerns were raised about some missing details and the way the results are presented, but the reviewers agree that the author feedback addresses most of these concerns satisfactorily. R2 points out that the sensitivity of FID to noise has been observed previously in the literature. I recommend that the authors take this into account when they update the manuscript.


Reviews: Improved Techniques for Training GANs

Neural Information Processing Systems

The results presented in the paper are impressive and significant enough. However, the results are quite empirical, inconclusive, and lack theoretical justification. For the rebuttal, please focus on answering the (*), (**), and (***) mentioned in the following paragraphs. The reviewer is willing to change the score if all the questions are well addressed. Novelty: The techniques proposed in the paper are novel in general. However, the proposed "feature matching" technique for training GANs has been explored to some extent:
-- Generating Images with Perceptual Similarity Metrics based on Deep Networks by Dosovitskiy and Brox
-- Autoencoding beyond pixels using a learned similarity metric by Larsen et al.
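The feature-matching technique the review refers to trains the generator to match statistics of an intermediate discriminator layer rather than to fool the discriminator's final output. A minimal sketch, using a random projection with a ReLU as a stand-in for the discriminator's feature layer (an assumption for illustration, not the paper's network):

```python
import numpy as np

# Sketch of a feature-matching objective: penalize the L2 distance between
# the mean feature activations of a real batch and a generated batch.
rng = np.random.default_rng(0)
W = rng.standard_normal((4, 8))  # stand-in for an intermediate discriminator layer

def features(x):
    return np.maximum(x @ W, 0.0)  # ReLU features

def feature_matching_loss(real_batch, fake_batch):
    diff = features(real_batch).mean(axis=0) - features(fake_batch).mean(axis=0)
    return float(np.sum(diff ** 2))

real = rng.standard_normal((16, 4))
fake = rng.standard_normal((16, 4))
print(feature_matching_loss(real, real))  # 0.0 when the batches coincide
print(feature_matching_loss(real, fake))  # positive for mismatched batches
```

The generator minimizes this loss, which gives it a smoother training signal than the raw discriminator output.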


Improved techniques for deterministic l2 robustness

Neural Information Processing Systems

Training convolutional neural networks (CNNs) with a strict 1-Lipschitz constraint under the l2 norm is useful for adversarial robustness, interpretable gradients and stable training. However, their performance often significantly lags behind that of heuristic methods for enforcing Lipschitz constraints, where the resulting CNN is not provably 1-Lipschitz. In this work, we reduce this gap by introducing (a) a procedure to certify robustness of 1-Lipschitz CNNs by replacing the last linear layer with a 1-hidden-layer MLP, which significantly improves their performance for both standard and provably robust accuracy, (b) a method to significantly reduce the training time per epoch for Skew Orthogonal Convolution (SOC) layers (an approximately 30% reduction for deeper networks), and (c) a class of pooling layers based on the mathematical property that the l2 distance of an input to a manifold is 1-Lipschitz. Using these methods, we significantly advance the state of the art for standard and provably robust accuracies on CIFAR-10 (gains of 1.79% and 3.82%) and similarly on CIFAR-100 (gains of 3.78% and 4.75%) across all networks.
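The property invoked in (c), that the l2 distance from an input to a fixed set is a 1-Lipschitz function, follows from the triangle inequality and can be checked numerically. The random point set below is an illustrative stand-in for a manifold, not data from the paper:

```python
import numpy as np

# Check that d(x) = min over m in M of ||x - m||_2 satisfies
# |d(x) - d(y)| <= ||x - y||_2, i.e. d is 1-Lipschitz.
rng = np.random.default_rng(1)
M = rng.standard_normal((50, 3))  # finite stand-in for a manifold

def dist_to_set(x):
    return float(np.min(np.linalg.norm(M - x, axis=1)))

ok = True
for _ in range(1000):
    x, y = rng.standard_normal(3), rng.standard_normal(3)
    ok &= abs(dist_to_set(x) - dist_to_set(y)) <= np.linalg.norm(x - y) + 1e-12
print(bool(ok))  # True
```

Because the property holds for any fixed set, a pooling layer built from such distances cannot increase the Lipschitz constant of the network.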


Improved Techniques for Maximum Likelihood Estimation for Diffusion ODEs

Zheng, Kaiwen, Lu, Cheng, Chen, Jianfei, Zhu, Jun

arXiv.org Artificial Intelligence

Diffusion models have exhibited excellent performance in various domains. The probability flow ordinary differential equation (ODE) of diffusion models (i.e., diffusion ODEs) is a particular case of continuous normalizing flows (CNFs), which enables deterministic inference and exact likelihood evaluation. However, the likelihood estimation results by diffusion ODEs are still far from those of the state-of-the-art likelihood-based generative models. In this work, we propose several improved techniques for maximum likelihood estimation for diffusion ODEs, including both training and evaluation perspectives. For training, we propose velocity parameterization and explore variance reduction techniques for faster convergence. We also derive an error-bounded high-order flow matching objective for finetuning, which improves the ODE likelihood and smooths its trajectory. For evaluation, we propose a novel training-free truncated-normal dequantization to fill the training-evaluation gap commonly existing in diffusion ODEs. Building upon these techniques, we achieve state-of-the-art likelihood estimation results on image datasets (2.56 on CIFAR-10, 3.43/3.69 on ImageNet-32) without variational dequantization or data augmentation. Code is available at \url{https://github.com/thu-ml/i-DODE}.


Improved Techniques for Training GANs

Salimans, Tim, Goodfellow, Ian, Zaremba, Wojciech, Cheung, Vicki, Radford, Alec, Chen, Xi, Chen, Xi

Neural Information Processing Systems

We present a variety of new architectural features and training procedures that we apply to the generative adversarial networks (GANs) framework. Using our new techniques, we achieve state-of-the-art results in semi-supervised classification on MNIST, CIFAR-10 and SVHN. The generated images are of high quality as confirmed by a visual Turing test: Our model generates MNIST samples that humans cannot distinguish from real data, and CIFAR-10 samples that yield a human error rate of 21.3%. We also present ImageNet samples with unprecedented resolution and show that our methods enable the model to learn recognizable features of ImageNet classes.


How to Implement the Inception Score (IS) for Evaluating GANs

#artificialintelligence

Generative Adversarial Networks, or GANs for short, are a deep learning neural network architecture for training a generator model that produces synthetic images. A problem with generative models is that there is no objective way to evaluate the quality of the generated images. As such, it is common to periodically generate and save images during model training and to use subjective human evaluation of the generated images both to assess their quality and to select a final generator model. Many attempts have been made to establish an objective measure of generated image quality. An early and somewhat widely adopted example of an objective evaluation method for generated images is the Inception Score, or IS.
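The Inception Score the article introduces is defined as IS = exp(E_x KL(p(y|x) || p(y))), where p(y|x) are the class probabilities a pretrained classifier (Inception v3 in practice) assigns to a generated image and p(y) is their marginal. A minimal sketch on toy probability vectors standing in for real classifier outputs:

```python
import numpy as np

# Inception Score from per-sample class probabilities p(y|x):
# average the KL divergence to the marginal p(y), then exponentiate.
def inception_score(p_yx):
    p_y = p_yx.mean(axis=0, keepdims=True)  # marginal label distribution
    kl = (p_yx * (np.log(p_yx + 1e-12) - np.log(p_y + 1e-12))).sum(axis=1)
    return float(np.exp(kl.mean()))

# Confident, diverse predictions score high; uniform predictions score ~1.
sharp = np.eye(4)                 # 4 samples, each confidently a different class
uniform = np.full((4, 4), 0.25)   # 4 samples, all maximally uncertain
print(inception_score(sharp))     # ≈ 4.0 (the number of classes is the upper bound)
print(inception_score(uniform))   # ≈ 1.0 (the lower bound)
```

In practice the probabilities come from running many generated images through the Inception v3 classifier; the toy vectors here only illustrate the formula's behavior at its two extremes.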